Neural Models for Part-Whole Hierarchies

Authors

  • Maximilian Riesenhuber
  • Peter Dayan
Abstract

We present a connectionist method for representing images that explicitly addresses their hierarchical nature. It blends data from neuroscience about whole-object viewpoint-sensitive cells in inferotemporal cortex and attentional basis field modulation in V4 with ideas about hierarchical descriptions based on microfeatures. The resulting model makes critical use of bottom-up and top-down pathways for analysis and synthesis. We illustrate the model with a simple example of representing information about faces.

We thank Larry Abbott, Geoff Hinton, Bruno Olshausen, Tomaso Poggio, Alex Pouget, Emilio Salinas and Pawan Sinha for discussions and comments.

1 Hierarchical Models

Images of objects constitute an important paradigm case of a representational hierarchy, in which 'wholes', such as faces, consist of 'parts', such as eyes, noses and mouths. The representation and manipulation of part-whole hierarchical information in fixed hardware is a heavy millstone around connectionist necks, and has consequently been the inspiration for many interesting proposals, such as Pollack's RAAM.

We turned to the primate visual system for clues. Anterior inferotemporal cortex (IT) appears to construct representations of visually presented objects. Mouths and faces are both objects, and so require fully elaborated representations, presumably at the level of anterior IT, probably using different (or possibly partially overlapping) sets of cells. The natural way to represent the part-whole relationship between mouths and faces is to have a neuronal hierarchy, with connections bottom-up from the mouth units to the face units so that information about the mouth can be used to help recognize or analyze the image of a face, and connections top-down from the face units to the mouth units expressing the generative or synthetic knowledge that if there is a face in a scene, then there is (usually) a mouth, too. There is little empirical support for or against such a neuronal hierarchy, but it seems extremely unlikely on the grounds that arranging for one with the correct set of levels for all classes of objects seems to be impossible.

There is recent evidence that activities of cells in intermediate areas in the visual processing hierarchy (such as V4) are influenced by the locus of visual attention. This suggests an alternative strategy for representing part-whole information, in which there is an interaction, subject to attentional control, between top-down generative and bottom-up recognition processing. In one version of our example, activating units in IT that represent a particular face leads, through the top-down generative model, to a pattern of activity in lower areas that is closely related to the pattern of activity that would be seen when the entire face is viewed. This activation in the lower areas in turn provides bottom-up input to the recognition system. In the bottom-up direction, the attentional signal controls which aspects of that activation are actually processed, for example, specifying that only the activity reflecting the lower part of the face should be recognized. In this case, the mouth units in IT can then recognize this restricted pattern of activity as being a particular sort of mouth. Therefore, we have provided a way by which the visual system can represent the part-whole relationship between faces and mouths.

This describes just one of many possibilities. For instance, attentional control could be mainly active during the top-down phase instead. Then it would create in V1 (or indeed in intermediate areas) just the activity corresponding to the lower portion of the face in the first place. Also the focus of attention need not be so ineluctably spatial.
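The face-to-mouth traversal described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the binary templates, the names `FACES`, `MOUTHS` and `LOWER`, and the nearest-template recognizer stand in for the generative and recognition pathways and are not taken from the paper.

```python
# Toy sketch only: templates, names and sizes are hypothetical.
# Each face is 8 "lower-area" units: first 4 = upper half, last 4 = lower half.
FACES = {
    "face_A": [1, 0, 1, 0, 1, 1, 0, 0],
    "face_B": [0, 1, 0, 1, 0, 0, 1, 1],
}
# Mouth templates as they would appear in the lower half of the image.
MOUTHS = {
    "mouth_smile": [1, 1, 0, 0],
    "mouth_frown": [0, 0, 1, 1],
}
LOWER = slice(4, 8)  # attentional window covering the lower part of the face


def synthesize(face_name):
    """Top-down step: an object unit produces a pattern of lower-area activity."""
    return list(FACES[face_name])


def attend(activity, region):
    """Attention selects which part of the activity is passed bottom-up."""
    return activity[region]


def recognize(patch, templates):
    """Bottom-up step: return the template closest in squared distance."""
    def distance(name):
        return sum((a - b) ** 2 for a, b in zip(patch, templates[name]))
    return min(templates, key=distance)


def part_of(face_name):
    """Traverse whole -> part: synthesize the face, attend low, recognize a mouth."""
    return recognize(attend(synthesize(face_name), LOWER), MOUTHS)


print(part_of("face_A"))
print(part_of("face_B"))
```

The point of the sketch is that no dedicated face-to-mouth connection is stored: the part is recovered by running the generative pathway, restricting the result with attention, and running recognition on what remains.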
The overall scheme is based on a hierarchical top-down synthesis and bottom-up analysis model for visual processing, as in the Helmholtz machine (note that "hierarchy" here refers to a processing hierarchy rather than the part-whole hierarchy discussed above), with a synthetic model forming the effective map:

$$\text{'object'} \otimes \text{'attentional eye-position'} \rightarrow \text{'image'} \tag{1}$$

(shown in cartoon form in figure 1), where 'image' stands in for the (probabilities over the) activities of units at various levels in the system that would be caused by seeing the aspect of the 'object' selected by placing the focus and scale of attention appropriately. We use this generative model during synthesis in the way described above to traverse the hierarchical description of any particular image. We use the statistical inverse of the synthetic model as the way of analyzing images to determine what objects they depict. This inversion process is clearly also sensitive to the attentional eye-position: it actually determines not only the nature of the object in the scene, but also the way that it is depicted (i.e., its instantiation parameters) as reflected in the attentional eye-position.

In particular, the bottom-up analysis model exists in the connections leading to the 2D viewpoint-selective image cells in IT reported by Logothetis et al., which form population codes for all the represented images (mouths, noses, etc.). The top-down synthesis model exists in the connections leading in the reverse direction. In generalizations of our scheme, it may, of course, not be necessary to generate an image all the way down in V1. The map (1) specifies a top-down computational task very like the bottom-up one addressed using a multiplicatively controlled synaptic matrix in the shifter model
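One way to picture a map of the form (1) is as a bilinear function in which the attentional signal multiplicatively gates how object activity is routed to image units. The sketch below is only an illustration of that reading: the weight tensor `W`, the circular-shift wiring and the function names are hypothetical choices, not the model's actual connectivity.

```python
def make_shift_weights(n):
    """Hypothetical weights: W[i][j][k] = 1 if image unit i receives
    object unit j when the attentional position is k (circular shift)."""
    return [[[1.0 if (j + k) % n == i else 0.0 for k in range(n)]
             for j in range(n)]
            for i in range(n)]


def bilinear_map(W, obj, att):
    """image[i] = sum over j, k of W[i][j][k] * obj[j] * att[k]."""
    return [sum(W[i][j][k] * obj[j] * att[k]
                for j in range(len(obj))
                for k in range(len(att)))
            for i in range(len(W))]


W = make_shift_weights(4)
obj = [1.0, 2.0, 0.0, 0.0]

# A one-hot attentional position shifts the object pattern by one unit.
print(bilinear_map(W, obj, [0.0, 1.0, 0.0, 0.0]))

# A graded attentional signal yields a weighted blend of shifted patterns.
print(bilinear_map(W, obj, [0.5, 0.5, 0.0, 0.0]))
```

Because the attentional vector enters as a multiplicative factor, the same object activity can produce different images without changing any weights, which is the property the map relies on.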

Similar resources

Mapping Part-Whole Hierarchies into Connectionist Networks

Three different ways of mapping part-whole hierarchies into connectionist networks are described. The simplest scheme uses a fixed mapping and is inadequate for most tasks because it fails to share units and connections between different pieces of the part-whole hierarchy. Two alternative schemes are described, each of which involves a different method of time-sharing connections and units. The...

Aggregation from Multiple Perspectives by Roles

Lars Kirkegaard Bækdal, Bent Bruun Kristensen. The Maersk Mc-Kinney Moller Institute for Production Technology, University of Southern Denmark/Odense University, DK-5230 Odense M, Denmark. e-mail: flkb, [email protected] Abstract: A whole is aggregated from parts, where a part can be aggregated itself or be atomic. The structure forms an aggregation (or part-whole) hierarchy. The whole can be...

Development of an in-cylinder processes model of a CVVT gasoline engine using artificial neural network

Today, employing model based design approach in powertrain development is being paid more attention. Precise, meanwhile fast to run models are required for applying model based techniques in powertrain control design and engine calibration. In this paper, an in-cylinder process model of a CVVT gasoline engine is developed to be employed in extended mean valve control oriented model and also mod...

Application of Artificial Neural Networks in a Two-step Classification for Acute Lymphocytic Leukemia Diagnosis by Blood Lamella Images

Introduction: This study aimed to present a system based on intelligent models that can enhance the accuracy of diagnostic systems for acute leukemia. The three parts including preprocessing, feature extraction, and classification network are considered as associated series of actions. Therefore, any dysfunction or poor accuracy in each part might lead in general dysfunction of...

Daily Pan Evaporation Estimation Using Artificial Neural Network-based Models

Accurate estimation of evaporation is important for design, planning and operation of water systems. In arid zones where water resources are scarce, the estimation of this loss becomes more interesting in the planning and management of irrigation practices. This paper investigates the ability of artificial neural networks (ANNs) technique to improve the accuracy of daily evaporation estimation....

Reconstruction of the neural network model of motor control for virtual C.elegans on the basis of actual organism information

Introduction: C. elegans neural network is a good sample for neural networks studies, because its structural details are completely determined. In this study, the virtual neural network of this worm that was proposed by Suzuki et al. for control of movement was reconstructed by adding newly discovered synapses for each of these network neurons. These synapses are newly discovered in the actu...


Publication date: 1996